Evaluation of a deep learning magnetic resonance imaging reconstruction method for synthetic computed tomography generation in prostate radiotherapy

Highlights • Demonstration showed feasibility of combining commercial deep learning for MRI reconstruction and synthetic CT generation.• Differences in synthetic CT generated with and without deep learning MRI reconstruction were clinically negligible.• Commercial deep learning-based MRI reconstruction reduced scan time with 40%.


Introduction
One of the recent developments in radiation therapy (RT) is magnetic resonance imaging (MRI)-Only RT treatment planning.The rationale is to exclude the computed tomography (CT) examination and thereby simplify the workflow and avoid image registration uncertainties between CT and MRI [1,2].The electron density maps required by the treatment planning system for dose calculation are provided by synthetic CT (sCT) images generated from MR images [3].MRI is a slow imaging technique compared to CT, and the MR images are prone to low signal-to-noise ratio (SNR) and motion artifacts.Recently, deep learning-based MRI reconstruction (DLR) has been introduced, with considerable noise reduction and without loss of image quality.The improved SNR can be traded for either improved image resolution or reduced MRI acquisition time.For example, in diagnostic radiology it has been possible to reduce the in-vivo scan time with 50 % for T2weighted (T2w) brain imaging [4].DLR has also been proven successful for prostate and breast imaging by providing substantial noise reduction without any noticeable image blurring or loss of resolution in T2w images [5,6].
The requirements on MR images for use in RT differ in many aspects compared to diagnostic radiology.In RT there is a higher requirement on spatial accuracy, signal stability and image homogeneity and this is especially relevant for MR images used for generation of sCT [7][8][9].Therefore, it would be of interest to apply DLR to MRI data for generation of sCT images of the pelvis, for which a high SNR is desirable, but the acquisition time needs to be limited to keep the effects of physiologic motion in the images on an acceptable level.DLR may be a technique that can improve the MRI-only workflow by reducing the acquisition time without any counteracting factors.In addition, many of the sCT generation products are based on deep learning methods, which are known to be sensitive to the conformity of the input data relative to the training data [10,11].This aspect is of particular interest to study for the sCT generation method in use.
The aims of this study were twofold: 1) to evaluate the compatibility between a commercial DLR product and a commercial sCT generation software, and 2) to validate that the quality of the sCTs remained the same when using a DLR accelerated MRI acquisition protocol for RT.The evaluation was performed on clinical MRI-only based treatment plans for prostate cancer, by quantitatively assessing the Hounsfield Units (HU) and dosimetric integrity of sCTs created from such MR images.

Patient data
Two patient cohorts with prostate cancer patients, treated with ultrahypofractionated RT using a clinical MRI-only treatment workflow [12], were used in this study.Cohort 1 consisted of 24 prostate cancer patients and aimed to investigate the isolated effect of a commercial DLR method on MRI images with respect to sCT generation.Specifically, the GE HealthCare AIR Recon DL (GE Healthcare, Chicago, USA) method was investigated.Cohort 2 consisted of 15 prostate cancer patients and aimed to investigate the effect of the DLR method on sCT generation when Air Recon DL was used to enable an accelerated MRI acquisition protocol.
A fractionation scheme of 42.7 Gy in 7 fractions was planned (Eclipse v.15.6, Varian Medical Systems, Palo Alto, USA) and delivered for all patients using single or dual arc volumetric modulated arc therapy (VMAT) technique with a 6 MV flattening filter free energy on Varian TrueBeam linear accelerators (v2.7,Varian Medical Systems) equipped with the Millenium multileaf collimator.
The study was approved by the regional ethics board "Regionala Etikprövningsnämnden in Lund", diary number 2013/742.No patient data used in this study is openly available.

Data acquisition
Patients in cohort 1 were scanned on a GE Architect 3T (v.29.1, GE Healthcare, Chicago, USA) using a 16 channel GE HealthCare AIR Coil without any coil bridges [13].The raw k-space data from the acquisition of a single large field-of-view (LFOV) T2w scan was used to reconstruct the MR images with the conventional vendor MR image reconstruction method and the GE HealthCare AIR Recon DL method (Air Recon DL mode set to High) (Fig. 1A).Both reconstruction methods are commercially available.Examples of T2w images for the two reconstruction methods are provided in Fig. S2 in Supplementary material.
Patients in cohort 2 were scanned on the same MRI system as cohort 1, using the same coil setup, however the system software was v.30.0.In this cohort the protocol for the LFOV T2w scan was based on the same sequence type (FRFSE) as in cohort 1.To enable undersampling and acceleration with a 40 % time reduction, the protocol was revised in its acquisition parameters compared to cohort 1 and utilized AIR Recon DL as standard image reconstruction (Air Recon DL mode set to High).This was also the default clinical sequence used for sCT generation and the acquisition parameters were: TR 13 600 ms, TE 96 ms, FOV 320 × 480 mm, acquisition matrix 384 × 360, reconstructed matrix 512 × 512, slice thickness 2.5 mm, slice spacing 0 mm, 88 slices, phase encoding direction left-right, number of averages was 1, parallel imaging acceleration factor 2.5, receiver bandwidth 434 Hz/pixel, refocus flip angle 130 degrees, echo train length 15, intensity correction SCENIC, 3D distortion correction on, and total acquisition time of 3:51 min.Screenshot of full vendor specific protocol for cohort 2 can be found in Supplementary material (Fig. S1) where image examples can be found (Fig. S3).Additionally, another separate MRI acquisition for a second LFOV T2w scan was added to this cohort in the same scan session.The acquisition and reconstruction parameters seen in Fig .S1 in Supplementary material were the same as the accelerated clinical protocol in cohort 2 except that it used the conventional vendor-based image reconstruction and had the number of averages in the acquisition set to two, instead of one (double scan time, 7:29 min).Image examples can be found in Supplementary material (Fig. S4).The number of averages of two in the second scan was set to make sure that images in the two scans had comparable SNR.Examples of T2w images for the two scans are provided in Fig. S3 and Fig. S4 in Supplementary material.

Deep learning-based MRI reconstruction
The AIR Recon DL method uses raw complex k-space data as pipeline input to reconstruct MR images.It can replace conventional image filters, and it instead employs a deep convolutional neural network (CNN) to adjust MR image noise, minimize truncation artifacts (Gibbs ringing) and enhance edge sharpness [4].The CNN was trained via supervised learning on 4 million image pairs, contrasting near-perfect quality images with synthesized, lower-resolution versions featuring intentional artifacts and noise.This diverse set, spanning various anatomical structures, was further enriched with augmentations like rotations, flips, and noise for enhanced adaptability and robustness.A vendor-based description of the product is available as a white paper [14].

Synthetic CT generation
The reconstructed T2w images from each reconstruction method in cohort 1 were converted to sCTs using Spectronic MRI Planner (v.2.4.14, Spectronic Medical AB, Helsingborg, Sweden) [12,15], resulting in two sCT volumes sCT_orig and sCT_DL (Fig. 1A).All sCTs had three gold fiducial markers burnt in as spherical structures with high HU in the prostate.This was performed in the MRI Planner software based on the manually identified fiducial center of mass in the T2w image.MRI Planner is based on the transfer function estimation algorithm to calculate a CT representation for an incoming MRI by first estimating the spatially variant coefficients of an affine transfer function and then applying this transfer function to the MRI.The coefficients of the transfer function are generated using a structure of deep CNN [16].sCT generation for data in cohort 2 was performed in the same manner as described above and resulted in two sCT volumes sCT_Acc_DL_1nex and sCT_Acc_orig_2nex (Fig. 1B).Examples of sCT images from both MRI reconstruction methods in cohort 1 and 2 are provided in Fig. S5 and Fig. S6 respectively in Supplementary material.

Data analysis
In cohort 1, the sCT_DL was compared to sCT_orig with respect to sCT bone and soft tissue generation by calculating the difference image through image subtraction, and the volume of the body structure for both sCTs was measured.Associated T2w volumes were subtracted in the same way as the sCTs and visually compared.To assess differences between sCT_orig and sCT_DL in cohort 1, HU differences inside the patient contour (HU range − 990 to 10^13), outside the patient contour (HU range − 10^13 to − 200), in fat (− 200 to − 28.5), in muscle (− 28.5 to 100), in spongy bone (100 to 575) and in compact bone (575 to 10^13) were calculated on a voxel level and analyzed using MICE Toolkit (v.2021.1.0,NONPI Medical AB, Umeå, Sweden).The HU ranges were chosen according to Maspero et al [17] and mean absolute error (MAE) were calculated within each range.
To calculate the total mean dose differences for cohort 1 in planning target volume (PTV), clinical target volume (CTV) and organs at risk, the clinical RT structures, except the body structure, were copied from the sCT_orig to the sCT_DL.The clinical treatment plans, created using sCT_orig, were copied to sCT_DL and the dose was recalculated with the Anisotropic Analytical Algorithm (AAA, v15.6.05) using the same number of monitor units and identical field setup using Eclipse treatment planning software (v.15.6, Varian Medical Systems, Palo Alto, USA).In cohort 1, one patient had a missing treatment plan due to a cancelled treatment course after the MR imaging and was excluded from Notice the homogeneous content of noise and Gibbs ringing artefacts.Also notice that the image differences do not geometrically correspond to the image differences seen for the sCT.b) Absolute value of image in a) with a wider display window for improved visualization.c) Difference image of the two corresponding sCT volumes for a window width of − 15 to 15 Hounsfield units (HU).d) Absolute value of image in c) with a wider window of 0 to 150 HU for improved visualization of peripheral differences.The different window width and level enhance visibility, with higher signal values than the defined scale adopting the color of the highest defined window value.Notice the residual difference in left and right side of the patient for the patient with the largest difference and the small deviation on the patient left side for the patient with the smallest absolute body volume difference (HU scale saturated, true difference value was larger than 150).body volume and dose analysis but included in the HU assessment.The same dose evaluation was performed for cohort 2, where data was copied from sCT_Acc_DL_1nex to sCT_Acc_orig_2nex.
Gamma evaluation using the dose distributions from sCT_orig and sCT_DL was performed for 22 patients in cohort 1 using MICE toolkit (v.2022.4.9, NONPI Medical AB) with global gamma criteria of 1 %/1mm, 2 %/2 mm, 3 %/3 mm, all with dose cutoff thresholds of 0 %, 10 %, 50 % and 90 % of the prescribed dose.One patient out of the 23 could not be evaluated due to software limitations in the evaluation software.However, no relevant dosimetric deviations were detected for the corresponding dose distributions in the treatment planning system for this patient.Corresponding gamma evaluation was performed in cohort 2 using dose distributions from sCT_Acc_DL_1nex and sCT_Acc_orig_2nex.
A two-sided Wilcoxon signed rank test with a confidence level of 5 % was used from the Python SciPy package (v.1.11.1) [18] for the data analysis of the differences in sCT body volume, HU, and dose.

Results
All sCTs in both cohorts were of clinically expected image quality determined by visual inspection.The outer contour depicting and encompassing the pelvis resulted in an image volume approximately 0.1 % lower for sCT_orig compared to sCT_DL in cohort 1: median difference − 15.7 cm 3 , minimum − 27.2 cm 3 and maximum 7.5 cm 3 (n = 23), pvalue <0.001.The patients with the smallest and largest absolute volume differences measured in the sCT images are provided in Fig. 2.
MAE for every HU interval (sCT_orig-sCT_DL) is reported as the average HU over all subjects in cohort 1 (n = 24) together with ± of the standard deviation (1 SD) and [minimum maximum]: Inside the patient contour 2.9 ± 0.4 [2.The mean total dose differences for targets and organs at risks (sCT_orig-sCT_DL) in cohort 1 were all positive and below 0.06 Gy, which corresponds to a 0.1 % difference with respect to the prescribed dose of 42.7 Gy (n = 23, Fig. 3A).Individual p-values were <0.001 for all structures.Corresponding dose difference in cohort 2 (n = 15, sCT_Acc_DL_1nex -sCT_Acc_orig_2nex) also showed differences below 0.06 Gy (n = 15, Fig. 3B).Individual p-values for bladder, rectum, CTV and PTV were 0.004, 0.009, 0.001, and 0.003, respectively and >0.28 for the remaining structures.Fig. 3. A. Boxplot of mean total dose differences for clinical target volume (CTV) and planning target volume (PTV) and organs at risk (sCT_orig-sCT_DL) in cohort 1 (n = 23).Prescribed dose was 42.7 Gy to planning target volume (PTV).sCT_orig defined the image geometry for the created treatment plan which was copied to sCT_DL and recalculated.B. Boxplot of mean total dose differences for clinical target volume (CTV) and planning target volume (PTV) and organs at risk (sCT_Acc_DL_1nex -sCT_Acc_orig_2nex) in cohort 2 (n = 15).Prescribed dose was 42.7 Gy to planning target volume (PTV).sCT_Acc_DL_1nex defined the image geometry for the created treatment plan which was copied to sCT_Acc_orig_2nex and recalculated.
The gamma pass rates in cohort 1 were 99.8 % (n = 22) or above for all evaluated dose and distance criteria above 0 % of prescribed dose.Corresponding results for cohort 2 was 99.5 % (Table 1).

Discussion
DLR is a new method that has been proven to reduce the MR image noise considerably [4].As a result, in clinical diagnostic radiology the method is implemented with shorter examination times or improved image quality or both [5,6].In this study DLR was applied to MR images that were acquired for the purpose of generating sCT using another deep learning-based commercially available method.The combination of these products composed a unique combination of two serially linked commercial deep learning-based products.Our results show that the MR images reconstructed with DLR could be used for sCT generation without sacrificing the sCT integrity.The comparison revealed only minor, clinically negligible, differences in HU and calculated dose.This study also demonstrated a 40 % reduction in MRI acquisition time for sCT generation.This can potentially result in reduced intra-imaging patient motion, hence improved accuracy regarding target delineation and identification of e.g., intra-prostatic fiducial markers.
To the best of our knowledge, this is the first study using deep learning to reconstruct MR images which are subsequently used for deep learning based sCT generation.It is therefore difficult to compare our results to previous studies.The performance of a previous release of MRI Planner has been validated against CT for male pelvis [12,15], where the overall mean dose differences between sCT and CT dose distributions were below 0.3 %.The corresponding dose differences in this study were below this level (Fig. 3A).Evaluation of sCT images for head and neck treatments using MRI Planner with a similar software version has also been performed with a reported MAE of 67 ± 14 HU within the body contour [19].The MAE results presented in this study are lower, with the largest MAE of 14.2 HU for compact bone.From recent review papers on deep learning methods for pelvis sCT generation, a MAE between 32 and 51 HU was reported in one paper [20] while an average MAE of 42.4 HU was reported in another review [11].These errors are larger than ours, however one important aspect must be taken into consideration: In our study, a comparison of results using either sCT or CT is not performed, as the results originate from sCTs generated with and without DLR.The comparison in this study does not rely on different image scan sessions, and it is therefore expected that the differences encountered are smaller than otherwise found in the literature.
The observed MAE and ME in HU and observed dose differences for both cohorts in this study were low and can be considered clinically negligible.However, the dose differences in cohort 1 showed a systematic offset, i.e., the calculated dose was always higher on the original sCT compared to the DL sCT for all targets and organs at risk (Fig. 3A).This was most probably due to the larger outer contour of sCT_DL resulting in a larger body volume compared to sCT_orig.This is supported by the statistically significant results regarding body volume measurement and visually demonstrated in Fig. 2. We therefore have strong evidence to believe that the larger body volume of the sCT_DL (approximately 0.1 %) was due to decreased Gibbs ringing artifacts and reduced MR image noise, due to the DLR.In addition, a change in the body volume could not be visually observed for the MRI volumes, suggesting that it rather was the slightly different image quality around the body-to-air boundary in the MR images that was affecting the sCT generation.
The statistically significant differences in total dose differences for bladder, rectum, CTV and PTV in cohort 2 (Fig. 3B) were probably the result of the structure propagation when RT structures were copied from the clinical image geometry (sCT_Acc_DL_1nex) to the comparing geometry (sCT_Acc_orig_2nex) in combination with patient relaxation and compression during the MR scan session [13].This relaxation effect has also been observed in previous sCT studies [15].
The investigations performed in our study focused on two effects.The compatibility between two deep learning-based commercial products, i.e. using DLR for MR images as input to a sCT generation software in cohort 1.A strength of this part of the study was that identical MRI kspace data was used in both MR image reconstructions.Thereby, the effects of the DLR method were studied in isolation, with no other confounders.The second effect investigated was the acceleration of the MRI acquisition, enabled by using DLR on data in cohort 2, which allowed for a much shorter scan time with retained MR image SNR and without compromising the sCT quality.The clinical usefulness and value were demonstrated for a revised acquisition protocol and the benefits from using DLR was proven by scan time reduction from 7:00 to 3:51 min (40 % reduction).A limitation of our study was the specific set-up, which does not allow for a generalization of the results to other MRI vendors with other DLR methods.
This study concludes that the integration of two deep learning-based commercial products designed for MRI reconstruction and sCT generation is both compatible and suitable for generating sCT in the pelvic region.Additionally, deep learning-based MRI reconstruction facilitates a reduction in the required MRI scan time for the acquisition sequence intended for sCT generation.

Availability of data and materials
The present data is summarized in this paper and Supplementary materialsl.

Ethics approval and consent to participate
The study was approved by the regional ethics board "Regionala Etikprövningsnämnden in Lund", diary number 2013/742.

Consent for publication
Not applicable.

Table 1
Mean gamma pass rate results in % for distance and dose criteria for multiple dose thresholds in cohort 1 (n = 22) comparing dose calculated on sCT_orig to dose calculated on sCT_DL and cohort 2 (n = 15) comparing dose calculated on sCT_Acc_DL_1nex to dose calculated on sCT_Acc_orig_2nex.

Fig. 1 .
Fig. 1. A. Data workflow for the patients in cohort 1 using two parallel image reconstruction pipelines on the same k-space data, one with conventional reconstruction (top row in A) and one with the GE HealthCare AIR Recon DL reconstruction (bottom row in A).B. Data workflow for the patients in cohort 2 using two image reconstruction pipelines, each one with unique k-space data, one with GE HealthCare AIR Recon DL reconstruction (clinical protocol, one average acquired, top row in B) and one with the conventional reconstruction (two averages acquired, bottom row in B).Images presented in B are symbolic representations and might not represent actual patient data.

Fig. 2 .
Fig.2.The figures represent differences between MRI using conventional reconstruction and DLR and their corresponding sCT in cohort 1.The left-hand side originates from the patient with the largest absolute value of body volume difference (27.2 cm 3 ), and the right-hand side originates from the patient with the smallest absolute value of body volume difference (3.5 cm 3 ).The images represent the following: a) Difference image between MR volumes with conventional and DLR reconstruction.Notice the homogeneous content of noise and Gibbs ringing artefacts.Also notice that the image differences do not geometrically correspond to the image differences seen for the sCT.b) Absolute value of image in a) with a wider display window for improved visualization.c) Difference image of the two corresponding sCT volumes for a window width of − 15 to 15 Hounsfield units (HU).d) Absolute value of image in c) with a wider window of 0 to 150 HU for improved visualization of peripheral differences.The different window width and level enhance visibility, with higher signal values than the defined scale adopting the color of the highest defined window value.Notice the residual difference in left and right side of the patient for the patient with the largest difference and the small deviation on the patient left side for the patient with the smallest absolute body volume difference (HU scale saturated, true difference value was larger than 150).